Merge Join
The Join node is used to create joins between 2 tables, adding the join to the database schema. The Join node can be connected to any Select (excluding Multi Select), Preparation, and Column Operation nodes. The Join function combines columns (horizontally) from 2 tables, based on matching columns and according to the selected join type.
The Join node can be used to join tables from the data source, or to join data source tables with new tables generated by data cleansing and preparation functions. For example, both the Date Range and Summarize functions generate new tables; these can then be joined to the original table from the data source. At other times, you may have tables showing similar information that might be more user friendly for data analysts if combined into one table.
Create a Join
From the Data Flow:
-
Drag the Join node onto the canvas and connect it to the two tables that are being joined.
-
Select the Join node on the Canvas, and go to the Properties panel to configure the join:
- Resulting Table Name: Name the new table.
- Join Type: Choose the join type.
- Join Columns: For each table, select the column where the join is applied.
For more information, see Join Types.
.
Copy the join script
Click Copy to copy the script that underlies this join to your clipboard. Tip: You can use this script when testing in the database.
Join Types
Choose from the following join types:
- Full Outer Join: returns all rows from both tables.
- Inner Join: returns all rows from both tables based on a matching column. Only returns rows where the matching column's values are identical.
- Left Outer Join: returns all rows from the first / left table, and matching rows from the second / right table.
- Right Outer Join: returns all rows from the second / right table, and matching rows from the first / left table.
- Cross Join: returns a result set by multiplying the number of rows in the first table by the number of rows in the second table. The result is a table that returns all possible combinations of all rows from both tables.
Examples
In this example, we have 2 tables containing information about manufacturers: Manufacturer Details (red highlight below) and Manufacturers (green highlight).
We want to combine the columns in each of these 2 tables to produce 1 table:
To achieve this, we add the Join node to the Data Flow (ETL) and connect it to both tables using a left outer join (blue arrow below), making sure to connect the tables in the order we want to merge them.
This produces the following results set, which includes 2 identical Manufacturer columns (red), one from each table:
Finally, to show only one Manufacturer column, we need to open the Column Selection window from the Properties panel and deselect one of Manufacturer columns (red highlight below), leaving us with the required results set:
In this example, the Summarize node was added to the table to produce new columns based on summarize functions:
The new Summarize columns were then joined to the original table by connecting the Join node to both the Select table and the Summarize table, and adding a right outer join: